Music in Advertising Videos

A study by two researchers from Hungary, Monica Coronel and Anna Irimiás, highlights how music plays a vital role in destination promotional videos, shaping both how viewers think and feel. Beyond simply capturing attention or creating atmosphere, music reflects a destination’s identity, evokes emotion, and connects with specific audiences on a deeper, cultural level.

These findings inspired me to explore Vietnamese advertising music, particularly in understanding how its musical structure contributes to the emotional and cultural impact of advertisements. In this portfolio, I will investigate the musical characteristics and structural elements commonly found in Vietnamese advertising music. My central research question is:

“What are the musical styles and structural features of Vietnamese advertising music?”

To represent Vietnamese advertising music, I selected two tracks suitable for advertising videos showcasing Vietnamese culture and nature. After experimenting with generative AI tools, I opted for royalty-free tracks from Pixabay and SoundCloud. I used keywords such as “Vietnam,” “folk instruments,” “adventurous music,” and “travel” on both platforms, and filtered for “bright” mood and “cinematic music” theme on Pixabay. I chose these tracks as they feature Vietnamese folk instruments — a key focus — and include a strong bass, which I think would enhance engagement and evoke emotions in listeners, aligning well with the commercial and storytelling purposes of advertising videos.

Here are my two tracks:

Overview (track-level)

Overall

Description

This interactive boxplot presents the distribution of various Essentia features extracted from the class corpus. The black points represent all tracks in the dataset, while my tracks are highlighted in pink for better visibility.

The two selected tracks reflect two different musical directions within Vietnamese advertising music:

lesley-n-1 stands out for its exceptionally high approachability, making it instantly accessible and friendly, an essential trait for drawing in diverse audiences. However, its low instrumentalness score is unexpected for a track built around zither and other folk instruments, a discrepancy discussed further in the clustering section.

Meanwhile, lesley-n-2 leans toward a more culturally expressive profile, with above-average instrumentalness, consistent with its instrumental arrangement built on folk instruments, one of the core traits of Vietnamese promotional media.

Interestingly, both tracks fall below the corpus median in engagingness, suggesting limited dynamism or emotional activation, which may reduce their ability to sustain audience attention. lesley-n-2 in particular also scores notably low in both arousal and valence, pointing to a more subdued emotional tone. In contrast, danceability and tempo for both tracks remain close to the corpus median, maintaining rhythmic consistency. Altogether, these characteristics shape a calm and stable musical texture: less dynamic, but well suited to a supporting role in storytelling within advertising videos.
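As a rough illustration of how such a boxplot comparison can be read numerically, the Python sketch below locates one track's value relative to the corpus quartiles. It assumes the feature values are available as a plain list; the helper name `describe_position` and all numbers are hypothetical rather than the actual class-corpus data, and the plotting tool may compute quartiles with a slightly different method than `statistics.quantiles`.

```python
import statistics


def describe_position(value, corpus_values):
    """Locate one track's feature value relative to the corpus quartiles.

    Returns Q1, median, Q3 and the boxplot region the value falls into.
    """
    # statistics.quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(corpus_values, n=4)
    if value < q1:
        region = "below Q1"
    elif value < median:
        region = "between Q1 and median"
    elif value <= q3:
        region = "between median and Q3"
    else:
        region = "above Q3"
    return {"q1": q1, "median": median, "q3": q3, "region": region}


# Hypothetical engagingness scores, not the real class-corpus values.
corpus = [0.31, 0.42, 0.45, 0.50, 0.52, 0.55, 0.58, 0.61, 0.66, 0.72]
print(describe_position(0.40, corpus)["region"])  # prints "below Q1"
```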

Chromagrams

Track 1

Track 2

Description

The chromagrams provide a snapshot of pitch class usage in Vietnamese advertising music:

Track 1 shows strong concentration in D, E, F, G, and A, forming a pentatonic-like scale closely associated with traditional Vietnamese folk music. The scarcity of other chromatic tones gives a stable harmonic palette and a grounded, calm sonic environment, ideal as a subtle background in storytelling-focused advertisements.

Track 2 reveals consistent and strong chroma energy in D, E, F♯, G, and A, forming a D-centered five-note collection drawn from the D major scale. This structure also reflects Southeast Asian tonalities, but with slightly more brightness and motion; very light traces of C, F, and B suggest a more evolving harmonic texture.

Both tracks rely on pentatonic-like frameworks, suggesting that Vietnamese advertising music often draws on folk-inspired tonal material to convey cultural identity.
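A chromagram summary like the one above can be approximated by summing the 12-bin chroma frames into a normalized pitch-class profile and keeping the strongest bins. This minimal sketch assumes chroma frames are given as lists of 12 values ordered from C; the frames below are invented to mimic Track 2's D, E, F♯, G, A emphasis, and the function names are hypothetical.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_profile(chroma_frames):
    """Sum 12-bin chroma frames and normalize so the bins add up to 1."""
    totals = [0.0] * 12
    for frame in chroma_frames:
        for i, value in enumerate(frame):
            totals[i] += value
    norm = sum(totals) or 1.0  # avoid division by zero for silent input
    return [t / norm for t in totals]


def dominant_pitch_classes(chroma_frames, k=5):
    """Return the k strongest pitch classes, listed in chromatic order from C."""
    profile = pitch_class_profile(chroma_frames)
    strongest = sorted(range(12), key=lambda i: profile[i], reverse=True)[:k]
    return [PITCH_CLASSES[i] for i in sorted(strongest)]


# Invented frames with weight on D, E, F#, G, A and faint C, F, B.
frames = [
    [0.05, 0, 1.0, 0, 0.9, 0.05, 0.8, 0.7, 0, 0.85, 0, 0.05],
    [0.0, 0, 0.9, 0, 0.8, 0.1, 0.9, 0.6, 0, 0.9, 0, 0.1],
]
print(dominant_pitch_classes(frames))  # prints ['D', 'E', 'F#', 'G', 'A']
```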

Chord and Key Estimation

Track 1 - Chordogram

Track 1 - Keygram

Track 2 - Chordogram

Track 2 - Keygram

Chordogram

These chordograms visualize the harmonic structure of Tracks 1 and 2:

Track 1

The chordogram for Track 1 reflects a stable, harmonically grounded structure centered on less conventional chords such as G♭:maj, A♭:maj, E♭:maj, G♯:min, B:maj, and D♯:min. These chords lie outside the D-centered pitch collection seen in the chromagram, hinting at modal ambiguity, borrowed chords, or a non-Western tuning approach. The absence of strong transitions or modulations points toward a repetitive structure and a consistent tonal center, which supports the track's function as a calm backdrop.

Track 2

In contrast, Track 2 exhibits greater harmonic variability and a more dynamic chord progression. The distribution is denser and more active across the whole vertical axis, with noticeable intensities in the mid-lower range (F:min, A♭:maj, D:min, E:min, G:min); minor and 7th chords appear frequently, suggesting richer harmonic color. The chordogram also reveals clear shifts in harmonic intensity at around 60s, 100s, 160s, and 190s, pointing to sectional modulations. This type of progression allows the music to fit multi-scene commercials or cinematic storytelling, where the music needs to evolve with the visual narrative.
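Chordograms of this kind are commonly built by comparing each chroma frame with a set of chord templates. The sketch below is one simple version of that idea, assuming binary major and minor triad templates and cosine similarity; the tool behind the figures may use different templates (for example including the 7th chords seen in Track 2) and a different distance measure. Chord names here use sharps, so G♭:maj would appear as F#:maj.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Binary templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for quality, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[f"{PITCH_CLASSES[root]}:{quality}"] = template
    return templates


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_chord(chroma_frame):
    """Label one chroma frame with its most similar triad template."""
    scores = {name: cosine_similarity(chroma_frame, t)
              for name, t in chord_templates().items()}
    return max(scores, key=scores.get)


# A frame containing only D, F# and A.
d_major_frame = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
print(best_chord(d_major_frame))  # prints D:maj
```

A full chordogram would keep all 24 scores for every frame over time rather than only the best label.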

Keygram

The keygrams reveal tonal changes that provide insight into modulation patterns and instrumental improvisation across both tracks:

Track 1

Track 1 displays high tonal consistency throughout most of its duration, which aligns with the chordogram’s harmonic stability. However, subtle key shifts occur around the 10-second mark, coinciding with the entrance of drums in the opening, and again near the 200-second mark, where the track transitions to a softer texture with only zither and piano. These moments of shift could represent musical framing devices, an intro and outro, helping to sonically define the start and finish while keeping the core tonal center stable.

Track 2

In contrast, Track 2 reveals greater key mobility, with key salience fluctuating over time. Alternating vertical bands of brightness and darkness represent modulations or shifts in tonal emphasis. For example, the darker segments (0–60s and 110–150s) align with deeper, more ambient passages featuring subdued percussion and warm string textures, while the brighter intervals (60–110s and after 150s) coincide with instrumental improvisation driven by high-pitched flutes and electronic embellishments, which move the harmonic structure forward.

Overall, these keygrams suggest that Vietnamese advertising music tends to preserve a central tonal identity while integrating predictable modulations or localized key shifts. These tonal movements, visualized through chroma-based key salience, strike a balance between coherence and expressive flexibility.
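Keygrams work similarly but compare chroma with key profiles instead of chord templates. This sketch assumes the Krumhansl–Kessler probe-tone profiles and Pearson correlation; whether the keygrams above used these exact profiles is an assumption, and the function names are hypothetical.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler probe-tone profiles, listed from the tonic upward.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def rotate(profile, tonic):
    """Shift a tonic-relative profile so its first value sits at index `tonic`."""
    return [profile[(i - tonic) % 12] for i in range(12)]


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def key_scores(chroma):
    """Correlate a 12-bin chroma vector with all 24 rotated key profiles."""
    scores = {}
    for tonic in range(12):
        for mode, profile in (("maj", MAJOR_PROFILE), ("min", MINOR_PROFILE)):
            scores[f"{PITCH_CLASSES[tonic]}:{mode}"] = pearson(chroma, rotate(profile, tonic))
    return scores


def estimate_key(chroma):
    scores = key_scores(chroma)
    return max(scores, key=scores.get)


# The five-note D, E, F#, G, A collection described for Track 2.
track2_like = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0]
print(estimate_key(track2_like))  # prints D:maj
```

Computing `key_scores` for each short window of a track, rather than for the whole track, gives the time-varying salience that a keygram displays.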

Cepstrograms

Track 1

Track 2

Description

The cepstrogram analysis offers a deeper look into the timbral landscape of both tracks:

Track 1

While Track 1 maintains an overall timbral consistency in the higher coefficients (3 and above), there are noticeable changes at specific time points such as 10s, 40s, 70s, 120s, 125s, 160s and 190s. These momentary shifts suggest transient structural and instrumental events.

More notably, when observing the lower-order coefficients — especially coefficients 0 and 1, which represent overall energy and spectral shape — there are distinct changes in color and intensity during 70s to 110s and again between 160s and 190s. These segments likely correspond to the use of transitional sound effects and the introduction of organ-like timbres, which slightly disturb the otherwise smooth texture. Such elements might serve to subtly signal new scenes and guide emotional flow in an advertisement, without disrupting the central tonal grounding of the track.

Track 2

Compared to Track 1, Track 2 exhibits more dynamic changes in timbre over time across almost all coefficients, indicating a more expressive and evolving sonic texture. In the higher-order coefficients (3 and above), there are alternating bands of darker and brighter orange colors, pointing to fluctuating spectral complexity. These visual contrasts align closely with the keygram analysis, where I noted tonal shifts and evolving harmonic intensities, further reinforcing the track’s structural variability.

A particularly striking feature appears in coefficient 1, where a distinct and consistently bright orange band emerges around the midpoint of the track. This moment corresponds with a transition between two major sections, marked by the entrance of a solo drum motif.

These findings suggest that timbral variation — whether minimal and steady as in Track 1 or more dynamic and contrastive as in Track 2 — plays a crucial role in shaping the emotional pacing and storytelling function of Vietnamese advertising music.
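To show why the lowest cepstral coefficients track energy and spectral shape, the sketch below takes the discrete cosine transform (DCT-II) of log band energies, which is the final step of an MFCC-style pipeline. It deliberately skips the FFT and mel filterbank, assumes the band energies are already given, and uses an unnormalized DCT; real feature extractors may scale or sign the coefficients differently, and the energy values are invented.

```python
import math


def cepstral_coefficients(band_energies, n_coeffs=4):
    """Unnormalized DCT-II of log band energies (lowest band first)."""
    logs = [math.log(e) for e in band_energies]
    m = len(logs)
    return [
        sum(logs[j] * math.cos(math.pi * k * (j + 0.5) / m) for j in range(m))
        for k in range(n_coeffs)
    ]


# Hypothetical band energies: a "dark" spectrum and its mirrored "bright" version.
dark = cepstral_coefficients([8.0, 4.0, 2.0, 1.0])
bright = cepstral_coefficients([1.0, 2.0, 4.0, 8.0])
print(round(dark[1], 3), round(bright[1], 3))
```

With this convention, coefficient 0 is the sum of the log energies, and coefficient 1 is positive when energy is concentrated in the low bands and negative when it leans toward the high bands.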

Chroma and Timbre Features

Track 1

Track 2

Track 1

Chroma-based Self-Similarity Matrix

In the chroma-based matrix, we observe a dense distribution of small block-like patterns, indicating high internal harmonic consistency. This aligns with previous chromagram and chord analysis, which identified a folk-inspired pentatonic structure and a lack of modulation. Interestingly, there are also faint, sometimes blurred diagonal paths parallel to the main diagonal, suggesting the repetition of harmonic material at regular intervals. Although these repetitions are not sharply defined, they hint at recurring structural motifs, possibly through soft reintroductions of earlier harmonic ideas rather than exact repeats.

Timbre-based Self-Similarity Matrix

The timbre matrix has softly defined block areas, especially around 70s, 100s, and 160s. These sections reflect homogeneous instrumental texture at those moments, such as the inclusion of organ-like timbres and atmospheric sound effects, which were also observed in the cepstrogram. While these textural changes are present, they remain gentle and integrated, avoiding sharp contrasts.

Track 2

Chroma-based Self-Similarity Matrix

The chroma-based matrix of Track 2 reveals very clear large blocks separated by distinct changes around 60s, 110s, and 165s. These segments exhibit strong internal similarity, suggesting that each section is harmonically cohesive, while the contrast between blocks points to intentional modulation. Faint diagonal paths across the matrix imply the reappearance of harmonic motifs throughout the track, possibly repeated themes that evolve over time.

Timbre-based Self-Similarity Matrix

The timbre-based matrix shows strong block segmentation, a sign of clear instrumental and textural shifts across sections. What stands out most are the lighter horizontal and vertical lines intersecting near the center (at around 110s). These bright lines indicate a drop in self-similarity at that point, highlighting a transitional moment where the timbral texture noticeably changes. This likely corresponds to a dramatic shift in instrumentation: the solo drum break also seen in the cepstrogram, which separates the two major parts of the track.
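Both self-similarity matrices follow the same recipe: compare every frame's feature vector (chroma or timbre) with every other frame. Here is a minimal sketch, assuming cosine distance, so 0 means identical and larger values mean less similar; the figures may use a different distance or normalization, and the two toy sections below are hypothetical.

```python
import math


def cosine_distance(a, b):
    """0 for vectors pointing the same way, 1 for orthogonal non-negative vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat silent frames as maximally different
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def self_similarity_matrix(frames):
    """Pairwise distance between every pair of feature frames."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]


# Hypothetical chroma frames: three of a C major triad, then three of F# major.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
f_sharp_major = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
ssm = self_similarity_matrix([c_major] * 3 + [f_sharp_major] * 3)
for row in ssm:
    print(" ".join(f"{d:.1f}" for d in row))
```

The printout shows two low-distance blocks along the diagonal, the same block pattern that marks harmonically cohesive sections in the matrices above.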

Temporal features

Track 1 - Novelty Function

Track 1 - Tempogram

Track 2 - Novelty Function

Track 2 - Tempogram

Novelty Function

Track 1

The spectral-based novelty function for Track 1 reflects a textural landscape that is smooth yet gently evolving. While there are no sharp or dramatic peaks, the function shows frequent moderate fluctuations, indicating a continuous stream of timbral and structural changes without disruption. Novelty values mostly stay under 0.20, with clusters of slightly elevated peaks around 10s, 40s, 110s–125s, and 140s. These correspond closely to the timbral shifts and sectional transitions previously observed in the cepstrogram and timbre-based self-similarity matrix, such as instrumental layering and transitional sound effects.

Track 2

In contrast, the spectral novelty function of Track 2 reveals a highly segmented and event-driven structure, with three clear peaks marking significant moments in the track: at the very beginning, at 110s, and near the end (around 200s). These peaks indicate sharp spectral changes, often associated with transitions between major sections or the introduction of contrasting timbral material. The initial peak corresponds with the entrance of strong percussion, setting the tone and energy for the opening of the piece. The central peak at 110s aligns with the solo drum break observed in both the cepstrogram and the self-similarity matrices, which serves as a structural turning point that divides the track into two contrasting halves. Finally, the late peak around 200s marks a final transition, closing the piece with a shift in musical direction.

Overall, Vietnamese advertising music often balances textural stability with sectional contrast, using spectral shifts to shape emotional flow and align with the pacing of visual narratives.
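One common way to compute a spectral novelty function is half-wave-rectified spectral flux: the summed increase in energy between consecutive normalized spectra. The sketch below assumes that definition and a simple threshold-based peak picker; whether the analysis tool computes novelty exactly this way is an assumption, and the threshold and toy spectra are arbitrary.

```python
def spectral_flux(spectra):
    """Novelty curve: summed positive change between consecutive normalized spectra."""
    def normalize(frame):
        total = sum(frame) or 1.0
        return [x / total for x in frame]

    frames = [normalize(f) for f in spectra]
    novelty = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        novelty.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return novelty


def pick_peaks(novelty, threshold):
    """Indices of local maxima that exceed `threshold`."""
    return [
        i for i in range(1, len(novelty) - 1)
        if novelty[i] > threshold
        and novelty[i] >= novelty[i - 1]
        and novelty[i] > novelty[i + 1]
    ]


# Hypothetical 3-band spectra: a low-heavy section followed by a high-heavy one.
low_heavy = [4, 1, 1]
high_heavy = [1, 1, 4]
novelty = spectral_flux([low_heavy] * 4 + [high_heavy] * 4)
print(pick_peaks(novelty, threshold=0.2))  # prints [4]
```

A single peak appears exactly where the timbre changes, which is the behavior behind the isolated peaks in Track 2 and the flatter curve of Track 1.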

Tempogram

Track 1

The cyclic tempogram of Track 1 reveals a strong sense of tempo stability centered around 118 BPM. Interestingly, the track begins at a noticeably slower tempo (around 90 BPM), gradually accelerates, and then stabilizes near 118 BPM. Toward the end of the piece (from ~190s onward), there is a slight deceleration, suggesting a structural cue for closure. These tempo changes contribute to a musical contour that reinforces the formal structure, subtly marking the entrance, development, and resolution of the track.

Track 2

The cyclic tempogram of Track 2 clearly reveals a stable fundamental tempo centered around 100 BPM. A fainter line around 140 BPM likely reflects a secondary rhythmic layer, such as faster decorative percussion; because 140 is not a simple multiple of 100, it is probably not a strict tempo harmonic. Notably, the tempogram also displays recurring vertical light lines at specific time points, particularly around 5s, 40s, 110s, and 140s. These indicate localized changes in beat salience and transient rhythmic disruptions, corresponding to musical transitions that can also be heard.

Based on this tempogram analysis, Vietnamese advertising background music appears to be characterized by clear, stable fundamental tempi, often accompanied by identifiable secondary tempo patterns. This rhythmic consistency suits the comfortable listening experience needed while viewing nature and culture in promotional videos. Visible changes do appear in the tempograms at certain time points, but they are modest, and careful listening suggests they stem mainly from changes in instrumentation.
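The tempo readings above can be approximated by autocorrelating a novelty (onset) curve and picking the lag with the strongest periodicity. This sketch assumes a novelty curve sampled at a known frame rate and a 60 to 180 BPM search range, both hypothetical choices. It also shows how a cyclic tempogram folds tempi: values a factor of two apart (for example 100 and 200 BPM) land on the same position, while 140 BPM does not.

```python
import math


def estimate_tempo(novelty, frame_rate, min_bpm=60.0, max_bpm=180.0):
    """Return the BPM whose beat period gives the strongest autocorrelation."""
    mean = sum(novelty) / len(novelty)
    centred = [x - mean for x in novelty]
    # Convert the BPM range into a range of lags measured in frames.
    min_lag = max(1, int(round(60.0 * frame_rate / max_bpm)))
    max_lag = min(len(centred) - 1, int(round(60.0 * frame_rate / min_bpm)))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(centred[i] * centred[i + lag] for i in range(len(centred) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag


def octave_position(bpm, reference=60.0):
    """Position (0 to 1) of a tempo within its tempo octave, as a cyclic tempogram folds it."""
    return math.log2(bpm / reference) % 1.0


# Hypothetical novelty curve: one pulse every 60 frames at 100 frames per second.
pulses = [1.0 if i % 60 == 0 else 0.0 for i in range(1200)]
print(estimate_tempo(pulses, frame_rate=100.0))  # prints 100.0
```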

Clustering

Dendrogram

Heatmap

Dendrogram

The dendrogram visually represents how similar or different tracks are based on musical-emotional features: tempo, arousal, valence and instrumentalness.
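The dendrogram can be approximated with agglomerative clustering on z-scored features. The sketch below assumes Euclidean distance and average linkage, which may differ from the settings behind the figure, and uses invented track names and values; it returns the merge sequence that a dendrogram draws from bottom to top.

```python
import math
import statistics


def z_score_columns(rows):
    """Standardize each feature column to mean 0 and (population) SD 1."""
    stats = [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]


def average_linkage(names, rows):
    """Return merges as (cluster_a, cluster_b, distance), from first to last."""
    vectors = dict(zip(names, z_score_columns(rows)))
    clusters = {name: [name] for name in names}  # cluster label -> member tracks
    merges = []
    while len(clusters) > 1:
        labels = list(clusters)
        best = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # Average linkage: mean distance over all cross-cluster pairs.
                d = statistics.mean(math.dist(vectors[x], vectors[y])
                                    for x in clusters[a] for y in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append(best)
        clusters[f"({a}+{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical tracks: [tempo, arousal, valence, instrumentalness].
tracks = {
    "calm-1": [100, 0.30, 0.30, 0.90],
    "calm-2": [98, 0.32, 0.28, 0.85],
    "edm-1": [128, 0.90, 0.70, 0.20],
    "edm-2": [126, 0.85, 0.75, 0.25],
}
merges = average_linkage(list(tracks), list(tracks.values()))
for a, b, d in merges:
    print(f"{a} + {b} at {d:.2f}")
```

Z-scoring first keeps tempo, which is measured in BPM, from dominating features that lie between 0 and 1.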

About the class corpus:

In the upper layers of the dendrogram (levels 1, 2, and 3), there is a noticeable leftward asymmetry, where a number of tracks branch off earlier and do not merge with the larger group until much later. This left-leaning structure suggests that several tracks — particularly those on the far left — are distinct from the general musical tendencies of the class corpus. These tracks likely deviate more strongly in one or more features such as tempo, arousal, valence, or instrumentalness.

For example, tracks such as berend-b-2, aleksandra-b-1, and reinout-w-2 appear on the outermost left branch. These compositions exhibit qualities typical of high-energy genres like EDM and disco, characterized by electronic timbres, strong rhythmic drive, high arousal, and fast tempos. Their early separation from the rest of the corpus reflects their greater dissimilarity in both emotional tone and production style.

This asymmetrical structure highlights the overall diversity of the class corpus.

About my tracks:

Both of my tracks are located within one of the large central clusters at level 4, alongside many other tracks in the class corpus. However, when examining the deeper levels of the dendrogram, the two tracks diverge significantly — each falling into different lower-level subclusters, indicating distinct musical-emotional profiles.

  • lesley-n-1 appears on the far right of the dendrogram and is clustered with tracks like wednesday-w-2, desmond-l-1, daniel-p-1, sanne-o-2, and ties-o-1/2.

  • lesley-n-2, by contrast, is situated in the middle-left region and grouped with more dynamic and segmented tracks like bram-d-1, daniel-p-2, and sarya-n-2.

This separation supports earlier analytical findings that, while both tracks are suitable for Vietnamese advertising, they represent distinct stylistic features.

Heatmap

These heatmaps specifically isolate the two clusters of neighboring tracks closest to my selected tracks. Each subcluster is characterized by different relationships among the four features.

Track 1

When listening to the neighboring tracks and comparing them to my first track (lesley-n-1), I initially noticed very little similarity, if any. Those tracks feel more electronic, with higher perceived arousal and fewer acoustic elements. In contrast, my own track is rich in instrumental textures, prominently featuring Vietnamese zither, drums, and other traditional folk instruments. Based on this, I expected a clear difference in instrumentalness. However, the heatmap reveals a surprising result: my track and its cluster neighbors share similarly low z-scored instrumentalness values, with my track having one of the lowest scores in the cluster.

This raises an important question: Can current music information retrieval systems accurately detect and interpret non-Western traditional instruments? The fact that my track — which is clearly instrumental to the human ear — scores low in instrumentalness suggests that models trained primarily on Western music datasets may fail to recognize or appropriately weight traditional Vietnamese timbres.

Track 2

When listening to bram-d-1, daniel-p-2, and sarya-n-2 and comparing them with mine, I noticed that all four share a similar genre and mood: lightweight, atmospheric tracks resembling the background or ambient music often used in cinematic or commercial settings. The heatmap confirms this sonic similarity: these tracks all exhibit very high instrumentalness, reflecting a focus on instrumental textures. They also share consistently low (even negative) z-scored values for tempo, valence, and arousal, meaning they are slower, more emotionally neutral or subdued, and less energetic.

Classification

k-Nearest Neighbour Result

class   precision  recall
AI      0.660      0.673
Non-AI  0.600      0.585

Mosaic

Heatmap

Random Forest Result

class   precision  recall
AI      0.681      0.653
Non-AI  0.605      0.634

Variable Importance

AI vs non-AI

Description

I examined two classification methods to distinguish between AI and non-AI generated tracks in the class corpus: k-Nearest Neighbour and Random Forest, both using all available features. The results show that Random Forest performs slightly better overall: its precision is higher for both classes (0.681 vs. 0.66 for AI, 0.605 vs. 0.6 for non-AI), and its non-AI recall rises from 0.585 to 0.634, although its AI recall is slightly lower (0.653 vs. 0.673).

Using the Random Forest model, the three most important features for classification were instrumentalness, arousal, and danceability. A scatter plot of these features shows substantial overlap between AI and non-AI tracks, indicating that the two categories are not easily separable. However, a general trend emerges: AI-generated tracks (purple) tend to cluster at low to mid values of instrumentalness and danceability, whereas non-AI tracks (yellow) are more widely distributed, including some with higher values on both dimensions. This suggests that, in the class corpus, AI-generated music tends to be less instrumental and less danceable, possibly reflecting limited expressive complexity compared to human-made music.
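Per-class precision and recall in the tables follow the standard definitions: precision is the share of tracks predicted as a class that truly belong to it, and recall is the share of a class's tracks that the model finds. The sketch below computes both, alongside a bare-bones k-nearest-neighbour vote; the labels and feature values are invented, and it does not reproduce any resampling or preprocessing behind the reported numbers.

```python
import math
from collections import Counter


def precision_recall(true_labels, predicted_labels, positive):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest to `query` (Euclidean)."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical ground truth and predictions for five tracks.
true = ["AI", "AI", "AI", "Non-AI", "Non-AI"]
pred = ["AI", "AI", "Non-AI", "AI", "Non-AI"]
print(precision_recall(true, pred, "AI"))

# Hypothetical 2-feature training set (e.g. instrumentalness, danceability).
train_rows = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9]]
train_labels = ["AI", "AI", "AI", "Non-AI", "Non-AI"]
print(knn_predict(train_rows, train_labels, [0.1, 0.1]))  # prints AI
```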

Conclusion

This portfolio has explored the musical and structural features of Vietnamese advertising music through a detailed analysis of two selected tracks. Despite representing different stylistic directions — one grounded and harmonically static, the other dynamic and segmentally expressive — both tracks share key traits that reflect broader tendencies in Vietnamese promotional media.

Across all dimensions analyzed — harmonic structure, key modulation, timbre, novelty, and tempo — Vietnamese advertising music appears to favor coherence, cultural resonance, and emotional subtlety. Tonal centers remain stable, tempo is consistent, and timbral shifts are used thoughtfully to support transitions and narrative development. This aligns well with the function of background music in advertising: to support visual storytelling without overwhelming it, while still evoking emotional and cultural depth.

Altogether, this analysis suggests that Vietnamese advertising music balances stability and variation, local identity and global accessibility, crafting sonic experiences that reinforce the beauty, rhythm, and emotion of Vietnam’s visual storytelling.

Thank you for reading my portfolio!